1 Introduction

Parsing is the process of determining the parses of an input string according to a grammar. 
In this chapter we will restrict ourselves to context-free grammars. Parsing is related to 
recognition, which is the process of determining whether an input string is in the language 
described by a grammar or automaton. Most algorithms we will discuss are recognition 
algorithms, but since they can be straightforwardly extended to perform parsing, we will 
not make a sharp distinction here between parsing and recognition algorithms. 

For a given grammar and an input string, there may be very many parses, perhaps 
too many to be enumerated one by one. Significant practical difficulties in computing 
and storing the parses can be avoided by computing individual fragments of these parses 
and storing them in a table. The advantage of this is that one such fragment may be 
shared by many different parses. The methods of tabular parsing that we will investigate 
in this chapter are capable of computing and representing exponentially many parses in 
polynomial time and space, respectively, by means of this idea of sharing of fragments 
between several parses. 

* Supported by the Royal Netherlands Academy of Arts and Sciences. Secondary affiliation is the 
German Research Center for Artificial Intelligence (DFKI). 
Tabular parsing, invented in the field of computer science in the period roughly between
1965 and 1975, also became known later in the field of computational linguistics as chart
parsing [?]. Tabular parsing is a form of dynamic programming. A closely related approach
is to apply memoization to functional parsing algorithms [20].

What is often overlooked in modern parsing literature is that many techniques of 
tabular parsing can be straightforwardly derived from non-tabular parsing techniques 
expressed by means of push-down automata. A push-down automaton is a device that 
reads input from left to right, while manipulating a stack. Stacks are a very common data 
structure, frequently used wherever there is recursion, such as for the implementation of 
functions and procedures in programming languages, but also for context-free parsing. 

Taking push-down automata as our starting point has several advantages for describing 
tabular parsers. Push-down automata are simpler devices than the tabular parsers that 
can be derived from them. This allows us to get acquainted with simple, non-tabular forms 
of context-free parsing before we move on to tabulation, which can, to a large extent, be 
explained independently from the workings of individual push-down automata. Thereby 
we achieve a separation of concerns. Apart from these presentational advantages, parsers 
can also be implemented more easily with this modular design than without. 

In Section 2 we discuss push-down automata and their relation to context-free grammars.
Tabulation in general is introduced in Section 3. We then discuss a small number of
specific tabular parsing algorithms that are well-known in the literature, viz. Earley's
algorithm (Section 4), the Cocke-Kasami-Younger algorithm (Section 5), and tabular LR
parsing (Section 6). Section 7 discusses compact representations of sets of parse trees,
which can be computed by tabular parsing algorithms. Section 8 provides further pointers
to relevant literature.

2 Push-down automata 

The notion of push-down automaton plays a central role in this chapter. Contrary to
what we find in some textbooks, our push-down automata do not possess states next to
stack symbols. This is without loss of generality, since states can be encoded into the
stack symbols. Thus, a push-down automaton (PDA) $\mathcal{A}$ is a 5-tuple
$(\Sigma, Q, q_{init}, q_{final}, \Delta)$, where $\Sigma$ is an alphabet, i.e., a finite set of
input symbols, $Q$ is a finite set of stack symbols, including the initial stack symbol
$q_{init}$ and the final stack symbol $q_{final}$, and $\Delta$ is a finite set of transitions.

A transition has the form $\sigma_1 \stackrel{v}{\mapsto} \sigma_2$, where
$\sigma_1, \sigma_2 \in Q^*$ and $v \in \Sigma^*$. Such a transition can be applied if the stack
symbols $\sigma_1$ are found to be the top-most few symbols on the stack and the input
symbols $v$ are the first few symbols of the unread part of the input. After application
of such a transition, $\sigma_1$ has been replaced by $\sigma_2$, and the next $|v|$ input
symbols are henceforth treated as having been read.

More precisely, for a fixed PDA and a fixed input string $w = a_1 \cdots a_n \in \Sigma^*$,
$n \geq 0$, we define a configuration as a pair $(\sigma, i)$ consisting of a stack
$\sigma \in Q^*$ and an input position $i$, $0 \leq i \leq n$. The input position indicates how
many of the symbols from the input have already been read. Thereby, position 0 and
position $n$ indicate the beginning and the end, respectively, of $w$. We define the binary
relation $\vdash$ on configurations by: $(\sigma, i) \vdash (\sigma', j)$ if and only if there is
some transition $\sigma_1 \stackrel{v}{\mapsto} \sigma_2$ such that $\sigma = \sigma_3 \sigma_1$ and
$\sigma' = \sigma_3 \sigma_2$, some $\sigma_3 \in Q^*$, and $v = a_{i+1} a_{i+2} \cdots a_j$. Here we
assume $i \leq j$, and if $i = j$ then $v = \epsilon$, where $\epsilon$ denotes the empty string.
Note that in our notation, stacks grow from left to right, i.e., the top-most stack symbol
will be found at the right end.

$q_0 \stackrel{a}{\mapsto} q_0 q_1$
$q_0 q_1 \stackrel{b}{\mapsto} q_0 q_2$
$q_0 q_1 \stackrel{b}{\mapsto} q_0 q_3$
$q_2 \stackrel{c}{\mapsto} q_2 q_4$
$q_3 \stackrel{c}{\mapsto} q_3 q_4$
$q_4 \stackrel{d}{\mapsto} q_4 q_5$
$q_4 q_5 \stackrel{\epsilon}{\mapsto} q_6$
$q_2 q_6 \stackrel{\epsilon}{\mapsto} q_7$
$q_3 q_6 \stackrel{\epsilon}{\mapsto} q_8$
$q_0 q_7 \stackrel{\epsilon}{\mapsto} q_9$
$q_0 q_8 \stackrel{\epsilon}{\mapsto} q_9$

Figure 1: Transitions of an example PDA.

Figure 2: Two sequences of configurations, leading to recognition of the string abcd.

We denote the reflexive and transitive closure of $\vdash$ by $\vdash^*$; in other words,
$(\sigma, i) \vdash^* (\sigma', j)$ means that we may obtain configuration $(\sigma', j)$ from
$(\sigma, i)$ by applying zero or more transitions. We say that the PDA recognizes a string
$w = a_1 \cdots a_n$ if $(q_{init}, 0) \vdash^* (q_{final}, n)$. This means that we start with a
stack containing only the initial stack symbol, and the input position is initially 0, and
recognition is achieved if we succeed in reading the complete input, up to the last
position $n$, while the stack contains only the final stack symbol. The language accepted
by a PDA is the set of all strings that it recognizes.

As an example, consider the PDA with $\Sigma = \{a, b, c, d\}$, $Q = \{q_0, \ldots, q_9\}$,
$q_{init} = q_0$, $q_{final} = q_9$, and the set $\Delta$ of transitions given in Figure 1. There
are two ways of recognizing the input string $w = a_1 a_2 a_3 a_4 = abcd$, indicated by the two
sequences of configurations in Figure 2.

We say a PDA is deterministic if for each configuration there can be at most one
applicable transition. The example PDA above is clearly nondeterministic due to the two
transitions $q_0 q_1 \stackrel{b}{\mapsto} q_0 q_2$ and $q_0 q_1 \stackrel{b}{\mapsto} q_0 q_3$.
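
The definitions above can be made concrete with a small simulation. The following Python sketch is our own illustration and not part of the formal development: it enumerates the recognizing sequences of configurations of the example PDA by naive backtracking. The transition list is our reading of Figure 1, and the names `TRANSITIONS`, `steps` and `accepting_runs` are hypothetical. The search terminates for this PDA, but it need not terminate for PDAs in general.

```python
# Each transition (sigma1, v, sigma2) replaces sigma1 on top of the stack by
# sigma2 while reading v. Stacks are tuples that grow to the right, and the
# empty string "" plays the role of epsilon.
TRANSITIONS = [
    (("q0",), "a", ("q0", "q1")),
    (("q0", "q1"), "b", ("q0", "q2")),
    (("q0", "q1"), "b", ("q0", "q3")),
    (("q2",), "c", ("q2", "q4")),
    (("q3",), "c", ("q3", "q4")),
    (("q4",), "d", ("q4", "q5")),
    (("q4", "q5"), "", ("q6",)),
    (("q2", "q6"), "", ("q7",)),
    (("q3", "q6"), "", ("q8",)),
    (("q0", "q7"), "", ("q9",)),
    (("q0", "q8"), "", ("q9",)),
]


def steps(config, w):
    """Yield every configuration reachable from config in one step."""
    stack, i = config
    for sigma1, v, sigma2 in TRANSITIONS:
        # sigma1 must be the top of the stack and v the next unread input.
        if stack[-len(sigma1):] == sigma1 and w.startswith(v, i):
            yield (stack[:-len(sigma1)] + sigma2, i + len(v))


def accepting_runs(w, q_init="q0", q_final="q9"):
    """Return all sequences of configurations that recognize w."""
    runs = []

    def search(path):
        if path[-1] == ((q_final,), len(w)):
            runs.append(path)
        for successor in steps(path[-1], w):
            search(path + [successor])

    search([((q_init,), 0)])
    return runs


for run in accepting_runs("abcd"):
    print(run)
```

Because every alternative is explored separately, steps that are common to several sequences are repeated once per sequence.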

A context-free grammar (CFG) $\mathcal{G}$ is a 4-tuple $(\Sigma, N, S, R)$, where $\Sigma$ is an
alphabet, i.e., a finite set of terminals, $N$ is a finite set of nonterminals, including
the start symbol $S$, and $R$ is a finite set of rules, each of the form $A \to \alpha$ with
$A \in N$ and $\alpha \in (\Sigma \cup N)^*$. The usual 'derives' relation is denoted by
$\Rightarrow$, and its reflexive and transitive closure by $\Rightarrow^*$. The language
generated by a CFG is the set $\{w \mid S \Rightarrow^* w\}$.

In practice, a PDA is not hand-written, but is automatically obtained from a CFG, 
by a mapping that preserves the generated/accepted language. Particular mappings from 
CFGs to PDAs can be seen as formalizations of parsing strategies. 

We define the size of a PDA as
$\sum_{(\sigma_1 \stackrel{v}{\mapsto} \sigma_2) \in \Delta} |\sigma_1 v \sigma_2|$, i.e., the total
number of occurrences of stack symbols and input symbols in the set of transitions.
Similarly, we define the size of a CFG as $\sum_{(A \to \alpha) \in R} |A \alpha|$, i.e., the total
number of occurrences of terminals and nonterminals in the set of rules.

3 Tabulation 

In this section, we will restrict the allowable transitions to those of the types
$q_1 \stackrel{a}{\mapsto} q_1 q_2$, $q_1 q_2 \stackrel{a}{\mapsto} q_1 q_3$, and
$q_1 q_2 \stackrel{\epsilon}{\mapsto} q_3$, where $q_1, q_2, q_3 \in Q$ and $a \in \Sigma$. The reason
is that this allows a very simple form of tabulation, based on work by [?]. In later
sections, we will again consider less restrictive types of transitions. Note that each of
the transitions in Figure 1 is of one of the three types above.

The two sequences of configurations in Figure 2 share a common step, viz. the application
of transition $q_4 \stackrel{d}{\mapsto} q_4 q_5$ at input position 3 when the top-of-stack is
$q_4$. In this section we will show how we can avoid doing this step twice. Although the
savings in time and space for this toy example are negligible, in realistic examples we
can reduce the costs from exponential to polynomial, as we will see later.

A central observation is that if two configurations share the same top-of-stack and the 
same input position, then the sequences of steps we can perform on them are identical 
as long as we do not access lower regions of the stack that differ between these two 
configurations. This implies for example that in order to determine which transition(s) of
the form $q_1 \stackrel{a}{\mapsto} q_1 q_2$ to apply, we only need to know the top-of-stack
$q_1$, and the current input position so that we can check whether $a$ is the next unread
symbol from the input.

These considerations lead us to propose a representation of sets of configurations as 
graphs. The set of vertices is partitioned into subsets, one for each input position, and 
each such subset contains at most one vertex for each stack symbol. This last condition 
is what will allow us to share steps between different configurations. 

We also need arcs in the graph to connect the stack symbols. This is necessary when
transitions of the form $q_1 q_2 \stackrel{a}{\mapsto} q_1 q_3$ or
$q_1 q_2 \stackrel{\epsilon}{\mapsto} q_3$ are applied, since these require access to deeper
regions of a stack than just its top symbol. The graph will contain an arc from a vertex
representing stack symbol $q$ at position $i$ to a vertex representing stack symbol $q'$ at
position $j \leq i$, if $q'$ resulted as the topmost stack symbol at input position $j$, and
$q$ can be immediately on top of that $q'$ at position $i$. If we take a path from a vertex
in the subset of vertices for position $i$, and follow arcs until we cannot go any further,
encountering stack symbols $q_1, \ldots, q_m$, in this order, then this means that
$(q_{init}, 0) \vdash^* (q_m \cdots q_1, i)$.

Figure 3: The collection of all derivable configurations represented as a graph. For each
input position there is a subset of vertices. For each such subset, there is at most one
vertex for each stack symbol.

For the running example, the graph after completion of the parsing process is given in
Figure 3. One detail we have not yet mentioned is that we need an imaginary stack symbol
$\bot$, which we assume occurs below the actual bottom-of-stack. We need this symbol to
represent stacks consisting of a single symbol. Note that the path from the vertex labelled
$q_9$ in the subset for position 4 to the vertex labelled $\bot$ means that
$(q_0, 0) \vdash^* (q_9, 4)$, which implies the input is recognized.

What we still need to explain is how we can construct the graph, for a given PDA and
input string. Let $w = a_1 \cdots a_n$, $n \geq 0$, be an input string. In the algorithm that
follows, we will manipulate 4-tuples $(q', j, q, i)$, where $q', q \in Q$ and $j, i$ are input
positions with $0 \leq j \leq i \leq n$. These 4-tuples will be called items. Item
$(q', j, q, i)$ means that there is an arc in the graph from a vertex representing $q$ at
position $i$ to a vertex representing $q'$ at position $j$. Formally, it means that for some
$\sigma$ we have $(q_{init}, 0) \vdash^* (\sigma q', j)$ and
$(\sigma q', j) \vdash^* (\sigma q' q, i)$, where in the latter relation the transitions that
are involved do not access any symbols internal to $\sigma$.

The algorithm is given in Figure 4. Initially, we let the set $T$ contain only the item
$(\bot, 0, q_{init}, 0)$, representing one arc in the graph. We then incrementally fill $T$
with more items, representing more arcs in the graph, until the complete graph has been
constructed. In this particular tabular algorithm, we process the symbols from the input
one by one, from left to right, applying all transitions as far as we can before moving on
to the next input symbol. Whereas $T$ contains all items that have been derived up to a
certain point, the set $\mathcal{N}$ contains only those items from $T$ that still need to be
combined with others in order to (possibly) obtain new items. The set $T$ will henceforth
be called the table and the set $\mathcal{N}$ the agenda.

1. Let $T = \{(\bot, 0, q_{init}, 0)\}$.

2. For $i = 1, \ldots, n$ do:

   (a) Let $\mathcal{N} = \emptyset$.

   (b) For each $(q', j, q_1, i-1) \in T$ and each transition
       $q_1 \stackrel{a_i}{\mapsto} q_1 q_2$ such that $(q_1, i-1, q_2, i) \notin T$, add
       $(q_1, i-1, q_2, i)$ to $T$ and to $\mathcal{N}$.

   (c) For each $(q_1, j, q_2, i-1) \in T$ and each transition
       $q_1 q_2 \stackrel{a_i}{\mapsto} q_1 q_3$ such that $(q_1, j, q_3, i) \notin T$, add
       $(q_1, j, q_3, i)$ to $T$ and to $\mathcal{N}$.

   (d) As long as $\mathcal{N} \neq \emptyset$ do:

       i. Remove some $(q_1, j, q_2, i)$ from $\mathcal{N}$.

       ii. For each $(q', k, q_1, j) \in T$ and each transition
           $q_1 q_2 \stackrel{\epsilon}{\mapsto} q_3$ such that $(q', k, q_3, i) \notin T$, add
           $(q', k, q_3, i)$ to $T$ and to $\mathcal{N}$.

3. Recognize the input if $(\bot, 0, q_{final}, n) \in T$.

Figure 4: Tabular algorithm to find the collection of all derivable configurations for input
$a_1 \cdots a_n$, in the form of a set $T$ of items.
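
To make the steps of Figure 4 concrete, the following Python sketch transcribes them fairly directly. It is an illustration under our own assumptions rather than a tuned implementation: the three transition types are passed as separate lists that we call `push`, `swap` and `pop` (hypothetical names), the table is a plain set scanned linearly, so the constant-time lookups assumed in the complexity analysis are not achieved, and the example data is our reading of Figure 1.

```python
BOTTOM = "<bottom>"  # imaginary stack symbol below the bottom of the stack


def tabulate(w, push, swap, pop, q_init):
    """Return the table T of items (q', j, q, i) for input string w.

    push: triples (q1, a, q2)      for transitions q1    --a--> q1 q2
    swap: tuples  (q1, q2, a, q3)  for transitions q1 q2 --a--> q1 q3
    pop:  triples (q1, q2, q3)     for transitions q1 q2 --eps--> q3
    """
    table = {(BOTTOM, 0, q_init, 0)}                      # step 1
    for i in range(1, len(w) + 1):                        # step 2
        agenda = []                                       # step 2(a)

        def add(item):
            if item not in table:
                table.add(item)
                agenda.append(item)

        previous = [item for item in table if item[3] == i - 1]
        for (_, _, q1, _) in previous:                    # step 2(b)
            for (p1, a, p2) in push:
                if p1 == q1 and a == w[i - 1]:
                    add((q1, i - 1, p2, i))
        for (q1, j, q2, _) in previous:                   # step 2(c)
            for (p1, p2, a, p3) in swap:
                if (p1, p2) == (q1, q2) and a == w[i - 1]:
                    add((q1, j, p3, i))
        while agenda:                                     # step 2(d)
            q1, j, q2, _ = agenda.pop()
            for (q_below, k, q_top, j2) in list(table):
                if (q_top, j2) == (q1, j):
                    for (p1, p2, p3) in pop:
                        if (p1, p2) == (q1, q2):
                            add((q_below, k, p3, i))
    return table


def recognizes(w, push, swap, pop, q_init, q_final):
    table = tabulate(w, push, swap, pop, q_init)
    return (BOTTOM, 0, q_final, len(w)) in table          # step 3


# The example PDA, as we read it from Figure 1.
PUSH = [("q0", "a", "q1"), ("q2", "c", "q4"), ("q3", "c", "q4"), ("q4", "d", "q5")]
SWAP = [("q0", "q1", "b", "q2"), ("q0", "q1", "b", "q3")]
POP = [("q4", "q5", "q6"), ("q2", "q6", "q7"), ("q3", "q6", "q8"),
       ("q0", "q7", "q9"), ("q0", "q8", "q9")]

print(recognizes("abcd", PUSH, SWAP, POP, "q0", "q9"))
```

Note that the item for the shared step, in which the symbol d is read, is added to the table only once, although it is reached from two different items at position 3.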

Let us analyze the worst-case time complexity of the algorithm in Figure 4. We assume
that the table $T$ is implemented as a square array of size $n+1$, indexed by input
positions $i$ and $j$, and that each item can be stored in and retrieved from $T$ in time
$O(1)$. The agenda $\mathcal{N}$ can be implemented as a stack. Let us consider Step 2(d). A
single application of this step takes time $O(1)$. Since each such application is uniquely
identified by a transition $q_1 q_2 \stackrel{\epsilon}{\mapsto} q_3$, a stack symbol $q'$ and the
three input positions $i$, $j$ and $k$, the number of possible applications of the step is
$O(|\Delta| \, |Q| \, n^3)$, which for our PDAs can be rewritten as
$O(|\mathcal{A}| \, |Q| \, n^3)$. It is not difficult to see that this quantity also dominates
the worst-case time complexity of our algorithm, which is thereby polynomial both in the
size of the PDA and in the length of the input string. A similar analysis shows that the
space complexity of the algorithm is $O(|Q|^2 n^2)$.

Although the use of the agenda in the algorithm from Figure 4 allows a fairly
straightforward implementation, it obscures somewhat how items are derived from other
items. This can be described more clearly by abstracting away from certain details of the
algorithm, such as the order in which items are added to $T$. This can be achieved by means
of a deduction system [30].^1 Such a system contains a set of inference rules, each
consisting of a list of antecedents, which stand for items that we have already established
to be in $T$, and, below a horizontal line, the consequent, which stands for an item that we
derive from the antecedents and that is added to $T$ unless it is already present. At the
right of an inference rule, we may also write a number of side conditions, which indicate
when rules may be applied, on the basis of transitions of the PDA.

^1 The earliest mention of abstract specifications of parsing algorithms may be due to [?]. See also [?].

$$\frac{}{(\bot, 0, q_{init}, 0)}$$

$$\frac{(q', j, q_1, i-1)}{(q_1, i-1, q_2, i)} \quad q_1 \stackrel{a_i}{\mapsto} q_1 q_2$$

$$\frac{(q_1, j, q_2, i-1)}{(q_1, j, q_3, i)} \quad q_1 q_2 \stackrel{a_i}{\mapsto} q_1 q_3$$

$$\frac{(q', k, q_1, j) \qquad (q_1, j, q_2, i)}{(q', k, q_3, i)} \quad q_1 q_2 \stackrel{\epsilon}{\mapsto} q_3$$

Figure 5: Tabular parsing algorithm in the form of a deduction system.

$$\frac{(q', j, q_1, i)}{(q_1, i, q_2, i)} \quad q_1 \stackrel{\epsilon}{\mapsto} q_1 q_2
\qquad\qquad
\frac{(q_1, j, q_2, i)}{(q_1, j, q_3, i)} \quad q_1 q_2 \stackrel{\epsilon}{\mapsto} q_1 q_3$$

Figure 6: Two additional inference rules for transitions of the form
$q_1 \stackrel{\epsilon}{\mapsto} q_1 q_2$ and $q_1 q_2 \stackrel{\epsilon}{\mapsto} q_1 q_3$.

A deduction system equivalent to the algorithm from Figure 4 is given in Figure 5.
In Figure 3, $(q_0, 0, q_1, 1)$ is derived from $(\bot, 0, q_0, 0)$ by means of
$q_0 \stackrel{a}{\mapsto} q_0 q_1$, $a$ being $a_1$; $(q_0, 0, q_2, 2)$ is derived from
$(q_0, 0, q_1, 1)$ by means of $q_0 q_1 \stackrel{b}{\mapsto} q_0 q_2$, $b$ being $a_2$;
$(q_0, 0, q_7, 4)$ is derived from $(q_0, 0, q_2, 2)$ and $(q_2, 2, q_6, 4)$ by means of
$q_2 q_6 \stackrel{\epsilon}{\mapsto} q_7$.

We may now extend our repertoire of transitions by those of the forms
$q_1 \stackrel{\epsilon}{\mapsto} q_1 q_2$ and $q_1 q_2 \stackrel{\epsilon}{\mapsto} q_1 q_3$, which
only requires two additional inference rules, indicated in Figure 6. To extend the
algorithm in Figure 4 to handle these additional types of transitions requires more effort.
Up to now, all items $(q, j, q', i)$, with the exception of $(\bot, 0, q_{init}, 0)$, were such
that $j < i$. If we had an item $(q_1, j, q_2, i)$ in the agenda $\mathcal{N}$ and were looking
for items $(q', k, q_1, j)$ in $T$, in order to apply a transition
$q_1 q_2 \stackrel{\epsilon}{\mapsto} q_3$, then we could be sure that we had access to all items
$(q', k, q_1, j)$ that would ever be added to $T$. This is because $j < i$, and all items
having $j$ as second input position had been found at an earlier iteration of the
algorithm.

However, if we add transitions of the form $q_1 \stackrel{\epsilon}{\mapsto} q_1 q_2$ and
$q_1 q_2 \stackrel{\epsilon}{\mapsto} q_1 q_3$, we may obtain items of the form $(q, j, q', i)$
with $j = i$. It may then happen that an item $(q', k, q_1, j)$ is added to $T$ after the
item $(q_1, j, q_2, i)$ is taken from the agenda $\mathcal{N}$ and processed. To avoid
overlooking any computation of the PDA, we must change the algorithm to take into account
that an item taken from the agenda may be of the form $(q', k, q_1, j)$, and we then need to
find items of the form $(q_1, j, q_2, i)$ already in the table, with $j = i$, in order to
apply a transition $q_1 q_2 \stackrel{\epsilon}{\mapsto} q_3$. We leave it to the reader to
determine the precise changes this requires to Figure 4 and to verify that it is possible
to implement these changes in such a way that the order of the time and space complexity
remains unchanged.
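
One way to sidestep the ordering problem altogether, at the price of efficiency, is to ignore the agenda and apply all inference rules repeatedly until no new item can be derived. The Python sketch below does exactly that. It is a naive fixed-point computation, not the modification of the agenda-driven algorithm described above, and it does not achieve the stated complexity bounds. Transitions that read the empty string are encoded with `""` in place of an input symbol; all names are our own.

```python
BOTTOM = "<bottom>"  # imaginary stack symbol below the bottom of the stack


def deduce(w, push, swap, pop, q_init):
    """Close the initial item under the inference rules for all five
    transition types; a symbol "" stands for the empty string.

    push: (q1, a, q2)      for q1    --a--> q1 q2
    swap: (q1, q2, a, q3)  for q1 q2 --a--> q1 q3
    pop:  (q1, q2, q3)     for q1 q2 --eps--> q3
    """
    table = {(BOTTOM, 0, q_init, 0)}
    while True:
        derived = set()
        for (q_low, j, q_top, i) in table:
            for (p1, a, p2) in push:
                if p1 == q_top and w.startswith(a, i):
                    derived.add((q_top, i, p2, i + len(a)))
            for (p1, p2, a, p3) in swap:
                if (p1, p2) == (q_low, q_top) and w.startswith(a, i):
                    derived.add((q_low, j, p3, i + len(a)))
            for (p1, p2, p3) in pop:
                if (p1, p2) == (q_low, q_top):
                    # Combine with every item that has q_low on top at position j.
                    for (q_below, k, q_mid, j2) in table:
                        if (q_mid, j2) == (q_low, j):
                            derived.add((q_below, k, p3, i))
        if derived <= table:          # nothing new: fixed point reached
            return table
        table |= derived


def recognizes(w, push, swap, pop, q_init, q_final):
    return (BOTTOM, 0, q_final, len(w)) in deduce(w, push, swap, pop, q_init)


# A tiny PDA with an epsilon-push; it accepts only the string "a".
EPS_PUSH = [("q0", "", "q1")]
EPS_SWAP = [("q0", "q1", "a", "q2")]
EPS_POP = [("q0", "q2", "q3")]

print(recognizes("a", EPS_PUSH, EPS_SWAP, EPS_POP, "q0", "q3"))
```

The computation terminates because there are only finitely many items for a given PDA and input string, and the table grows with every iteration but the last.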
4 Earley's algorithm

In this section we will investigate the top-down parsing strategy, and discuss tabulation
of the resulting PDAs. Let us fix a CFG $\mathcal{G} = (\Sigma, N, S, R)$ and let us assume that
there is only one rule in $R$ of the form $S \to \alpha$. The stack symbols of the PDA that we
will construct are the so-called dotted rules, defined as symbols of the form
$A \to \alpha \bullet \beta$ where $A \to \alpha\beta$ is a rule from $R$; in words, a stack symbol
is a rule in which a dot has been inserted somewhere in the right-hand side. Intuitively,
the dot separates the grammar symbols that have already been found to derive substrings of
the read input from those that are still to be processed. We will sometimes enclose dotted
rules in round brackets to enhance readability.

The alphabet of the PDA is the same as that of the CFG. The initial stack symbol is
$S \to \bullet\, \alpha$, the final stack symbol is $S \to \alpha\, \bullet$, and the transitions are:

1. $(A \to \alpha \bullet B\beta) \stackrel{\epsilon}{\mapsto} (A \to \alpha \bullet B\beta)\,(B \to \bullet\, \gamma)$ for all rules $A \to \alpha B \beta$ and $B \to \gamma$;

2. $(A \to \alpha \bullet b\beta) \stackrel{b}{\mapsto} (A \to \alpha b \bullet \beta)$ for each rule $A \to \alpha b \beta$, where $b \in \Sigma$;

3. $(A \to \alpha \bullet B\beta)\,(B \to \gamma\, \bullet) \stackrel{\epsilon}{\mapsto} (A \to \alpha B \bullet \beta)$ for all rules $A \to \alpha B \beta$ and $B \to \gamma$.

Given a stack symbol $A \to \alpha \bullet X\beta$, with $X \in \Sigma \cup N$, the indicated
occurrence of $X$ will here be called the goal. The goal in the top-of-stack is the symbol
that must be
matched against the next few unread input symbols. Transitions of type 1 above predict 
rules with nonterminal B in the left-hand side, when B is the goal in the top-of-stack. 
Transitions of type 2 move the dot over terminal goal b in the top-of-stack, if that b 
matches the next unread input symbol. Finally, transitions of type 3 combine the top- 
most two stack symbols, when the top-of-stack indicates that the analysis of a rule with 
B in the left-hand side has been completed. The current top-of-stack is removed, and in 
the new top-of-stack, the dot is moved over the goal B. 

Since the types of transition above are covered by what we discussed in Section 3, we
may apply a subset of the inference rules from Figures 5 and 6 to obtain a tabular parsing
algorithm for the top-down strategy. This will result in items of the form

$(A \to \alpha \bullet B\beta,\; j,\; B \to \gamma \bullet \delta,\; i)$.

However, it can be easily verified that if there is such an item in the table, and if some
stack symbol $A' \to \alpha' \bullet B\beta'$ may occur on top of the stack at position $j$, then
at some point, the table will also contain the item

$(A' \to \alpha' \bullet B\beta',\; j,\; B \to \gamma \bullet \delta,\; i)$.

An implication of this is that the first component $A \to \alpha \bullet B\beta$ of an item
represents redundant information, and may be removed without affecting the correctness of
the tabular algorithm. (See [?, Section 1.2.2] for the exact conditions that justify this
simplification.)
Figure 7: Tabular top-down parsing, or Earley's algorithm.

Figure 8: Table T obtained by Earley's algorithm, represented as upper triangular matrix.

After this simplification, we obtain the deduction system in Figure 7, which can be seen
as a specialized form of the tabular algorithm from the previous section. It is also known
as Earley's algorithm [?]. Its four steps are called initializer, predictor, scanner, and
completer.

As an example, consider the CFG with $\Sigma = \{a, *, +\}$, $N = \{S, E\}$ and with rules
$S \to E$, $E \to E * E$, $E \to E + E$ and $E \to a$, and consider the input string
$w = a + a * a$. Now that items are 3-tuples, it is more convenient to represent the table
$T$ as an upper triangular matrix rather than a graph, as exemplified by Figure 8. This
matrix consists of sets $T_{i,j}$, $i \leq j$, such that
$(A \to \alpha \bullet \beta) \in T_{i,j}$ if and only if $(i, A \to \alpha \bullet \beta, j) \in T$.
The string $w$ is recognized since the final stack symbol $S \to E\, \bullet$ is found in
$T_{0,5}$. Observe that $(0, S \to E\, \bullet, 5)$ can be derived from
$(0, S \to \bullet\, E, 0)$ and $(0, E \to E * E\, \bullet, 5)$ or from $(0, S \to \bullet\, E, 0)$ and
$(0, E \to E + E\, \bullet, 5)$. This indicates that $w$ is ambiguous.

It can be easily verified that Earley's algorithm adds an item
$(j, A \to \alpha \bullet \beta, i)$ to $T$ if and only if:

1. $S \Rightarrow^* a_1 \cdots a_j A \gamma$, for some $\gamma$, and

2. $\alpha \Rightarrow^* a_{j+1} \cdots a_i$.

In words, the existence of such an item in the table means that there is a derivation from
the start symbol $S$ that reaches $A$, the part of that derivation to the left of that
occurrence of $A$ derives the input from position 0 up to position $j$, and the prefix
$\alpha$ of the right-hand side of rule $A \to \alpha\beta$ derives the input from position $j$
up to position $i$.

The tabular algorithm of Figure 7 runs in time $O(|\mathcal{G}|^2 n^3)$ and space
$O(|\mathcal{G}| \, n^2)$, for a CFG $\mathcal{G}$ and for an input string of length $n$. Both upper
bounds can be easily derived from the general complexity results discussed in Section 3,
taking into account the simplification of items to 3-tuples.

To obtain a formulation of Earley's algorithm closer to a practical implementation,
such as that in Figure 4, read the remarks at the end of Section 3 concerning the agenda
and transitions that read the empty string. Alternatively, one may also preprocess certain
steps to avoid some of the problems with the agenda during parse time, as discussed
by [?], who also showed that the worst-case time complexity of Earley's algorithm can
be improved to $O(|\mathcal{G}| \, n^3)$.
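
As an illustration of the four steps, the following Python sketch implements an Earley recognizer with a single global agenda. It is a sketch under our own assumptions: items are stored as tuples `(j, lhs, rhs, dot, i)`, input symbols are single characters, the table is scanned linearly, and the completer is applied in both directions whenever an item is taken from the agenda, which is one possible way of coping with the ordering problem discussed in Section 3. No attempt is made to reach the complexity bounds stated above.

```python
def earley(rules, start, w):
    """Return the set of items (j, lhs, rhs, dot, i) derivable for input w.

    rules is a list of pairs (lhs, rhs), with rhs a tuple of grammar symbols;
    the item stands for the dotted rule lhs -> rhs[:dot] . rhs[dot:].
    """
    nonterminals = {lhs for lhs, _ in rules}
    table, agenda = set(), []

    def add(item):
        if item not in table:
            table.add(item)
            agenda.append(item)

    for lhs, rhs in rules:                                   # initializer
        if lhs == start:
            add((0, lhs, rhs, 0, 0))
    while agenda:
        j, lhs, rhs, dot, i = agenda.pop()
        if dot < len(rhs):
            goal = rhs[dot]
            if goal in nonterminals:
                for lhs2, rhs2 in rules:                     # predictor
                    if lhs2 == goal:
                        add((i, lhs2, rhs2, 0, i))
                for (j2, lhs2, rhs2, dot2, i2) in list(table):
                    # completer: a finished rule for the goal starts at i
                    if j2 == i and lhs2 == goal and dot2 == len(rhs2):
                        add((j, lhs, rhs, dot + 1, i2))
            elif i < len(w) and w[i] == goal:                # scanner
                add((j, lhs, rhs, dot + 1, i + 1))
        else:
            for (k, lhs2, rhs2, dot2, j2) in list(table):
                # completer: some item ending at j has lhs as its goal
                if j2 == j and dot2 < len(rhs2) and rhs2[dot2] == lhs:
                    add((k, lhs2, rhs2, dot2 + 1, i))
    return table


def recognizes(rules, start, w):
    table = earley(rules, start, w)
    return any((0, lhs, rhs, len(rhs), len(w)) in table
               for lhs, rhs in rules if lhs == start)


RULES = [("S", ("E",)), ("E", ("E", "*", "E")),
         ("E", ("E", "+", "E")), ("E", ("a",))]
print(recognizes(RULES, "S", "a+a*a"))
```

For the example grammar and input, the resulting table contains both the completed item for E → E * E and the one for E → E + E spanning positions 0 to 5, which reflects the ambiguity noted above.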

5 The Cocke-Kasami-Younger algorithm

Another parsing strategy is (pure) bottom-up parsing, which is also called shift-reduce
parsing [22]. It is particularly simple if the CFG $\mathcal{G} = (\Sigma, N, S, R)$ is in Chomsky
normal form, which means that each rule is either of the form $A \to a$, where
$a \in \Sigma$, or of the form $A \to B\, C$, where $B, C \in N$. The set of stack symbols is the
set of nonterminals of the grammar, and the transitions are:

1. $\epsilon \stackrel{a}{\mapsto} A$ for each rule $A \to a$;

2. $B\, C \stackrel{\epsilon}{\mapsto} A$ for each rule $A \to B\, C$.

A transition of type 1 consumes the next unread input symbol, and pushes on the stack
the nonterminal in the left-hand side of a corresponding rule. A transition of type 2 can
be applied if the top-most two stack symbols $B$ and $C$ are such that $B\, C$ is the
right-hand side of a rule, and it replaces $B$ and $C$ by the left-hand side $A$ of that
rule. Transitions of types 1 and 2 are called shift and reduce, respectively; see also
Section 6. The final stack symbol is $S$. We deviate from the other sections in this chapter
however by assuming that the PDA starts with an empty stack, or alternatively, that there
is some imaginary initial stack symbol that is not in $N$.

The transitions $B\, C \stackrel{\epsilon}{\mapsto} A$ are of a type that we have seen before,
and in a tabular algorithm for the PDA, such transitions can be realized by the inference
rule:

$$\frac{(k, B, j) \qquad (j, C, i)}{(k, A, i)} \quad A \to B\, C$$

Here we use 3-tuples for items, since the first components of the general 4-tuples are
redundant, just as in the case of Earley's algorithm above. Transitions of the type
$\epsilon \stackrel{a}{\mapsto} A$ are new, but they are similar to transitions of the familiar
form $B \stackrel{a}{\mapsto} B\, A$, where $B$ can be any stack symbol. Because $B$ is
irrelevant for deciding whether such a transition can be applied, the expected inference
rule

$$\frac{(j, B, i-1)}{(i-1, A, i)} \quad A \to a_i$$

can be simplified to:

$$\frac{}{(i-1, A, i)} \quad A \to a_i$$

A formulation of the tabular bottom-up algorithm closer to a typical implementation
is given in Figure 9. This algorithm is also known as the Cocke-Kasami-Younger (CKY)
algorithm [?]. Note that no agenda is needed. It can be easily verified that the CKY
algorithm adds an item $(j, A, i)$ to $T$ if and only if
$A \Rightarrow^* a_{j+1} \cdots a_i$.

1. Let $T = \emptyset$.

2. For $i = 1, \ldots, n$ do:

   (a) For each rule $A \to a_i$, add $(i-1, A, i)$ to $T$.

   (b) For $k = i-2, \ldots, 0$ and $j = k+1, \ldots, i-1$ do:

       • For each rule $A \to B\, C$ and all $(k, B, j), (j, C, i) \in T$, add $(k, A, i)$ to $T$.

3. Recognize the input if $(0, S, n) \in T$.

Figure 9: Tabular bottom-up parsing, or the CKY algorithm.

As an example, consider the CFG with $\Sigma = \{a, b\}$, $N = \{S, A\}$ and with rules
$S \to SS$, $S \to AA$, $S \to b$, $A \to AS$, $A \to AA$ and $A \to a$, and consider the input
string $w = aabb$. The table $T$ produced by the CKY algorithm is given in Figure 10,
represented as an upper triangular matrix. (Note that the sets $T_{i,i}$, $0 \leq i \leq n$, on
the diagonal of the matrix are always empty and are therefore omitted.) The string $w$ is
recognized since the final stack symbol $S$ is found in $T_{0,4}$.

Figure 10: Table T obtained by the CKY algorithm.

For a CFG $\mathcal{G} = (\Sigma, N, S, R)$ in Chomsky normal form and an input string of length
$n$, the tabular algorithm of Figure 9 runs in time $O(|R| \, n^3)$ and space
$O(|N| \, n^2)$. Again, these upper bounds can be easily derived from the general complexity
results discussed in Section 3, taking into account the simplification of items to
3-tuples. Note that the CKY algorithm runs in time proportional to the size of the grammar,
since $|\mathcal{G}| = O(|R|)$ for CFGs in Chomsky normal form. However, known transformations
to Chomsky normal form may increase the size of the grammar by a square function [13].
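
The CKY algorithm translates almost line by line into code. The Python sketch below is a minimal illustration under our own conventions: rules of the form A → a are passed as pairs `(A, a)`, rules of the form A → B C as triples `(A, B, C)`, and the table is a set of items `(j, A, i)`. The function and variable names are ours.

```python
def cky(unary, binary, w):
    """Return the set of items (j, A, i) such that A derives w[j:i].

    unary:  pairs (A, a)      for rules A -> a
    binary: triples (A, B, C) for rules A -> B C
    """
    table = set()
    for i in range(1, len(w) + 1):
        for (A, a) in unary:                       # shift: A -> a_i
            if a == w[i - 1]:
                table.add((i - 1, A, i))
        for k in range(i - 2, -1, -1):             # k = i-2, ..., 0
            for j in range(k + 1, i):              # j = k+1, ..., i-1
                for (A, B, C) in binary:           # reduce: A -> B C
                    if (k, B, j) in table and (j, C, i) in table:
                        table.add((k, A, i))
    return table


def recognizes(unary, binary, start, w):
    return (0, start, len(w)) in cky(unary, binary, w)


# The example grammar from the text.
UNARY = [("S", "b"), ("A", "a")]
BINARY = [("S", "S", "S"), ("S", "A", "A"), ("A", "A", "S"), ("A", "A", "A")]

print(recognizes(UNARY, BINARY, "S", "aabb"))
```

Since k runs downwards, all items (j, C, i) with j > k are already in the table when the span from k to i is considered, which is why no agenda is needed.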

6 Tabular LR parsing 

A more complex parsing strategy is LR parsing [?]. Its main importance is that it
results in deterministic PDAs for many practical CFGs for programming languages. For 
CFGs used in natural language systems however, the resulting PDAs are typically non- 
deterministic. Although in this case the advantages over simpler parsing strategies have 
never been convincingly shown, the frequent treatment of nondeterministic LR parsing in 
recent literature warrants its discussion here. 

A distinctive feature of LR parsing is that commitment to a certain rule is postponed 
until all grammar symbols in the right-hand side of that rule have been found to generate 
appropriate substrings of the input. In particular, different rules for which this has not yet 
been accomplished are processed simultaneously, without spending computational effort 
on any rule individually. As in the case of Earley's algorithm, we need dotted rules of
the form $A \to \alpha \bullet \beta$, where the dot separates the grammar symbols in the
right-hand side that have already been found to derive substrings of the read input from
those that are still to be processed. Whereas in the scanner step and in the completer
step from Earley's algorithm (Figure 7) each rule is individually processed by letting the
dot traverse its right-hand side, in LR parsing this traversal simultaneously affects sets
of dotted rules. Also the equivalent of the predictor step from Earley's algorithm is
now an operation on sets of dotted rules. These operations are pre-compiled into stack
symbols and transitions.

Let us fix a CFG $\mathcal{G} = (\Sigma, N, S, R)$. Assume $q$ is a set of dotted rules. We define
$closure(q)$ as the smallest set of dotted rules such that:

1. $q \subseteq closure(q)$, and

2. if $(A \to \alpha \bullet B\beta) \in closure(q)$ and $(B \to \gamma) \in R$, then $(B \to \bullet\, \gamma) \in closure(q)$.

In words, we extend the set of dotted rules by those that can be obtained by repeatedly
applying an operation similar to the predictor step. For a set $q$ of dotted rules and a
grammar symbol $X \in \Sigma \cup N$, we define:

$goto(q, X) = closure(\{(A \to \alpha X \bullet \beta) \mid (A \to \alpha \bullet X\beta) \in q\})$

The manner in which the dot traverses through right-hand sides can be related to the
scanner step of Earley's algorithm if $X \in \Sigma$ or to the completer step if $X \in N$.

The initial stack symbol $q_{init}$ is defined to be
$closure(\{(S \to \bullet\, \alpha) \mid (S \to \alpha) \in R\})$; cf. the initializer step of
Earley's algorithm. Other stack symbols are those non-empty sets of dotted rules that can
be derived from $q_{init}$ by means of repeated application of the goto function. More
precisely, $Q$ is the smallest set such that:

1. $q_{init} \in Q$, and

2. if $q \in Q$ and $goto(q, X) = q' \neq \emptyset$ for some $X$, then $q' \in Q$.

For technical reasons, we also need to add a special stack symbol $q_{final}$ to $Q$ that
becomes the final stack symbol. The transitions are:

1. $q_1 \stackrel{a}{\mapsto} q_1 q_2$ for all $q_1, q_2 \in Q$ and each $a \in \Sigma$ such that $goto(q_1, a) = q_2$;

2. $q_0 q_1 \cdots q_m \stackrel{\epsilon}{\mapsto} q_0 q'$ for all $q_0, \ldots, q_m, q' \in Q$ and each $(A \to \alpha\, \bullet) \in q_m$ such that $|\alpha| = m$ and $q' = goto(q_0, A)$;

3. $q_0 q_1 \cdots q_m \stackrel{\epsilon}{\mapsto} q_{final}$ for all $q_0, \ldots, q_m \in Q$ and each $(S \to \alpha\, \bullet) \in q_m$ such that $|\alpha| = m$ and $q_0 = q_{init}$.

The first type of transition is called shift. It can be seen as the pre-compilation of the
scanner step followed by repeated application of the predictor step. Note that only one
transition is applied for each input symbol that is read, independent of the number of
dotted rules in the sets $q_1$ and $q_2$. The second type of transition is called reduction.
It can be applied when the symbol on top of the stack contains a dotted rule with the dot
at the end of the right-hand side. First, as many symbols are popped from the stack as that
right-hand side is long, and then a symbol $q' = goto(q_0, A)$ is pushed on the stack. This
is related to the completer step from Earley's algorithm. The third type of transition is
very similar to the second. It is only applied once, when the start symbol has been found
to generate (a prefix of) the input.
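
The construction of the stack symbols can be sketched in a few lines of Python. The code below is an illustration under our own encoding, not a full LR parser generator: a dotted rule is a triple `(lhs, rhs, dot)`, a stack symbol is a `frozenset` of dotted rules, and `lr_states` returns the initial stack symbol, the set Q without the final stack symbol, and the goto function as a dictionary. Transitions are not generated, and all names are hypothetical.

```python
def closure(q, rules, nonterminals):
    """Smallest superset of q closed under prediction of nonterminals after the dot."""
    result, todo = set(q), list(q)
    while todo:
        _, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot] in nonterminals:
            for lhs2, rhs2 in rules:
                if lhs2 == rhs[dot] and (lhs2, rhs2, 0) not in result:
                    result.add((lhs2, rhs2, 0))
                    todo.append((lhs2, rhs2, 0))
    return frozenset(result)


def goto(q, X, rules, nonterminals):
    """Move the dot over X in all dotted rules of q that allow it, then close."""
    moved = {(lhs, rhs, dot + 1) for (lhs, rhs, dot) in q
             if dot < len(rhs) and rhs[dot] == X}
    return closure(moved, rules, nonterminals)


def lr_states(rules, start):
    """Return (q_init, Q, goto table), with Q excluding the final stack symbol."""
    nonterminals = {lhs for lhs, _ in rules}
    symbols = nonterminals | {X for _, rhs in rules for X in rhs}
    q_init = closure({(lhs, rhs, 0) for lhs, rhs in rules if lhs == start},
                     rules, nonterminals)
    states, todo, goto_table = {q_init}, [q_init], {}
    while todo:
        q = todo.pop()
        for X in symbols:
            q2 = goto(q, X, rules, nonterminals)
            if q2:                                  # only non-empty sets are stack symbols
                goto_table[(q, X)] = q2
                if q2 not in states:
                    states.add(q2)
                    todo.append(q2)
    return q_init, states, goto_table


RULES = [("S", ("S", "+", "S")), ("S", ("a",))]
q_init, states, goto_table = lr_states(RULES, "S")
print(len(states), len(goto_table))
```

For the grammar with rules S → S + S and S → a, this yields five sets of dotted rules and six entries in the goto function.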

For tabular LR parsing, we apply the same framework as in the previous sections, to 
obtain Figure [TTl A slight difficulty is caused by the new types of transition go ■ ■ ■ qm ^ 
go q' and go ■ ■ ■ qm ^ qfinai, but these can be handled by a straightforward generalization 
of inference rules from Figures El and IHl Note that we need 4-tuple items here rather 
than 3-tuple items as in the previous two sections. Tabular LR parsing is also known 
as generalized LR parsing (SHI EI] • In the hterature on generalized LR parsing, but only 
there, the table T of items is often called a graph- structured stack. 
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Figure 11: Tabular LR parsing, or generalized LR parsing 
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Figure 12: The set of stack symbols, excluding qjinai, and the goto function. 

As an example, consider the grammar with the rules S ^ S + S and S —>■ a. Apart 
from qfinai, the stack symbols of the PDA are represented in Figure [l2l as rectangles 
enclosing sets of dotted rules. There is an arc from stack symbol q to stack symbol q' 
labelled by X to denote that goto{q,X) = q'. For the input a + a + a, the table T is 
given by Figure [121 Note that gs at position 4 has two outgoing arcs, since it can arise 
by a shift with + from q^ or from g2- Also note that {-L,0,qfinai,5) is found twice, once 
from (±,0,gi„it,0), {qinit,0,q2,l), (^2,1,^3,2), (g3,2,g4,5), and once from (±, 0, g^^t, 0), 
{qimt,0,q2,S), (^2,3,^3,4), (g3,4,g4,5), in both cases by means of {S S + S •) E q^, 
with IS" + S"! =3. This indicates that the input a + a + a is ambiguous. 

If the grammar at hand does not contain rules of the form $A \to \varepsilon$, then the tabular
algorithm from Figure 11 can be reformulated in a way very similar to the algorithm from
Figure |3]. If there are rules of the form $A \to \varepsilon$ however, the handling of the agenda is
complicated, due to problems similar to those we discussed at the end of Section 121. This
issue is investigated by j2Zl Ell.

Figure 13: Table $T$ obtained by tabular LR parsing.

We now analyze the time and space complexity of tabular LR parsing. Let us fix a
CFG $\mathcal{G} = (\Sigma, N, S, R)$. Let $p$ be the length of the longest right-hand side of a rule in $R$ and
let $n$ be the length of the input string. Once again, we assume that $T$ is implemented as a
square array of size $n + 1$. Consider the reduction step in Figure 11. Each application
of this step is uniquely identified by $m + 1 \le p + 1$ input positions and $|Q|\,|R|$ combinations
of stack symbols. The expression $|Q|\,|R|$ is due to the fact that, once a stack symbol $q_0$
and a rule $A \to X_1 X_2 \cdots X_m$ have been selected such that $(A \to \bullet\, X_1 X_2 \cdots X_m) \in q_0$,
then stack symbols $q_i$, $1 \le i \le m$, and $q'$ are uniquely determined by $q_1 = \mathit{goto}(q_0, X_1)$,
$q_2 = \mathit{goto}(q_1, X_2)$, ..., $q_m = \mathit{goto}(q_{m-1}, X_m)$ and $q' = \mathit{goto}(q_0, A)$. (As can be easily
verified, a derivable stack of which the top-most symbol $q_m$ contains $(A \to X_1 X_2 \cdots X_m\, \bullet)$
must necessarily have top-most symbols $q_0 q_1 \cdots q_m$ with the above constraints.) Since a
single application of this step can easily be carried out in time $O(p)$, we conclude the
total amount of time required by all applications of the step is $O(|Q|\,|R|\,p\,n^{p+1})$. This is
also the worst-case time complexity of the algorithm, since the running time is dominated
by the reduction step. From the general complexity results discussed in Section |3] it
follows that the worst-case space complexity is $O(|Q|^2 n^2)$.
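
As a worked instance of the time bound (a sketch only, using the example grammar with rules $S \to S + S$ and $S \to a$ from earlier in this section), the longest right-hand side has length $p = 3$ and there are $|R| = 2$ rules, so the bound becomes quartic in the length of the input:

```latex
\[
  O\bigl(|Q| \cdot |R| \cdot p \cdot n^{p+1}\bigr)
  \;=\; O\bigl(|Q| \cdot 2 \cdot 3 \cdot n^{4}\bigr)
  \;=\; O\bigl(|Q|\, n^{4}\bigr).
\]
```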

We observe that while the above time bound is polynomial in the length of the input
string, it can be much worse than the corresponding bounds for Earley's algorithm or
for the CKY algorithm, since $p$ is not bounded. A solution to this problem has been
discussed by [1311231 and consists in splitting each reduction into $O(p)$ transitions of the
form $q'\, q'' \overset{\varepsilon}{\mapsto} q$. In this way, the maximum length of transitions becomes independent
of the grammar. This results in tabular implementations of LR parsing with cubic time
complexity in the length of the input. We furthermore observe that the term $|Q|$ in the
above bounds depends on the specific structure of $\mathcal{G}$, and may grow exponentially with
$|\mathcal{G}|$ [Proposition 6.46].
7 Parse trees



As stated in Section 1, recognition is the process of determining whether an input string
is in the language described by a grammar or automaton, and parsing is the process of
determining the parse trees of an input string according to a grammar. Although the
algorithms we have discussed up to now are recognition algorithms, they can be easily
extended to become parsing algorithms, as we show in this section. In what follows we
assume a fixed CFG $\mathcal{G} = (\Sigma, N, S, R)$ and an input string $w = a_1 \cdots a_n \in \Sigma^*$.

Since the number of parse trees can be exponential in the length of the input string,
and even infinite when $\mathcal{G}$ is cyclic, one first needs to find a way to compactly represent the
set of all parse trees. This is usually done through a CFG $\mathcal{G}_w$, called parse forest, defined as
follows. The alphabet of $\mathcal{G}_w$ is the same as that of $\mathcal{G}$, and the nonterminals of $\mathcal{G}_w$ have the
form $(j, A, i)$, where $A \in N$ and $0 \le j \le i \le n$. The start symbol of $\mathcal{G}_w$ is $(0, S, n)$. The
rules of $\mathcal{G}_w$ include at least those of the form $(i_0, A, i_m) \to (i_0, X_1, i_1) \cdots (i_{m-1}, X_m, i_m)$,
where (i) $(A \to X_1 \cdots X_m) \in R$, (ii) $S \Rightarrow^* a_1 \cdots a_{i_0} A\, a_{i_m+1} \cdots a_n$, and (iii) $X_j \Rightarrow^*
a_{i_{j-1}+1} \cdots a_{i_j}$ for $1 \le j \le m$, and those of the form $(i-1, a_i, i) \to a_i$. However, $\mathcal{G}_w$ may
also contain rules $(i_0, A, i_m) \to (i_0, X_1, i_1) \cdots (i_{m-1}, X_m, i_m)$ that violate constraints (ii)
or (iii) above. Such rules cannot be part of any derivation of a terminal string from
$(0, S, n)$ and can be eliminated by a process that is called reduction. Reduction can be
carried out in linear time in the size of $\mathcal{G}_w$ [32].

It is not difficult to show that the parse forest $\mathcal{G}_w$ generates a finite language, which
is either $\{w\}$ if $w$ is in the language generated by $\mathcal{G}$, or $\emptyset$ otherwise. Furthermore, there
is a one-to-one correspondence between parse trees according to $\mathcal{G}_w$ and parse trees of $w$
according to $\mathcal{G}$, with corresponding parse trees being isomorphic.

To give a concrete example, let us consider the CKY algorithm presented earlier in this chapter.
In order to extend this recognition algorithm to a parsing algorithm, we may construct the
parse forest $\mathcal{G}_w$ with rules of the form $(j, A, i) \to (j, B, k)\,(k, C, i)$, where $(A \to B\,C) \in R$
and $(j, B, k), (k, C, i) \in T$, rules of the form $(i-1, A, i) \to (i-1, a_i, i)$, where
$(A \to a_i) \in R$, and rules of the form $(i-1, a_i, i) \to a_i$. Such rules can be constructed during
the computation of the table $T$. In order to perform reduction on $\mathcal{G}_w$, one may visit
the nonterminals of $\mathcal{G}_w$ starting from $(0, S, n)$, following the rules in a top-down fashion,
eliminating the nonterminals and the associated rules that are never reached. From the
resulting parse forest $\mathcal{G}_w$, individual parse trees can be extracted in time proportional to
the size of the parse tree itself, which in the case of CFGs in Chomsky normal form is
$O(n)$. One may also extract parse trees directly from table $T$, but the time complexity
then becomes $O(|\mathcal{G}|\,n^2)$ [21 HH-.

Consider the table $T$ from Figure ITUl, which was produced by the CKY algorithm with
$w = aabb$ and $\mathcal{G} = (\Sigma, N, S, R)$, where $\Sigma = \{a, b\}$, $N = \{S, A\}$ and
$R = \{S \to SS,\ S \to AA,\ S \to b,\ A \to AS,\ A \to AA,\ A \to a\}$. The method presented above constructs the
parse forest $\mathcal{G}_w = (\Sigma, N_w, (0, S, 4), R_w)$, where $N_w \subseteq \{(j, B, i) \mid B \in N,\ 0 \le j < i \le 4\}$
and $R_w$ contains the rules in Figure 14. Rules that are eliminated by reduction are marked
by †.
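
The construction and reduction of the parse forest for this example can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation: the names `cky_forest`, `reduce_forest` and `count_trees` are hypothetical, the grammar is hard-coded from the example above, and reduction is done by plain top-down reachability, on the assumption that every item the CKY algorithm places in the table already derives a substring of the input.

```python
from collections import defaultdict

# Example grammar from the text, in Chomsky normal form.
BINARY = [("S", "S", "S"), ("S", "A", "A"), ("A", "A", "S"), ("A", "A", "A")]
LEXICAL = [("S", "b"), ("A", "a")]

def cky_forest(w):
    """Return (T, rules): the CKY table and the unreduced forest rules."""
    n = len(w)
    T = set()                    # items (j, A, i)
    rules = defaultdict(set)     # forest nonterminal -> set of right-hand sides
    for i in range(1, n + 1):
        a = w[i - 1]
        rules[(i - 1, a, i)].add((a,))                  # (i-1, a_i, i) -> a_i
        for A, t in LEXICAL:
            if t == a:
                T.add((i - 1, A, i))
                rules[(i - 1, A, i)].add(((i - 1, a, i),))
    for width in range(2, n + 1):
        for j in range(0, n - width + 1):
            i = j + width
            for k in range(j + 1, i):
                for A, B, C in BINARY:
                    if (j, B, k) in T and (k, C, i) in T:
                        T.add((j, A, i))
                        rules[(j, A, i)].add(((j, B, k), (k, C, i)))
    return T, rules

def reduce_forest(rules, start):
    """Keep only nonterminals (and their rules) reachable from `start`."""
    reached, todo = set(), [start]
    while todo:
        nt = todo.pop()
        if nt in reached or nt not in rules:
            continue
        reached.add(nt)
        for rhs in rules[nt]:
            # Forest nonterminals are tuples; terminals are plain strings.
            todo.extend(x for x in rhs if isinstance(x, tuple))
    return {nt: rules[nt] for nt in reached}

def count_trees(forest, nt, memo=None):
    """Count the parse trees rooted at `nt` in an acyclic forest."""
    memo = {} if memo is None else memo
    if nt not in memo:
        total = 0
        for rhs in forest[nt]:
            product = 1
            for x in rhs:
                if isinstance(x, tuple):
                    product *= count_trees(forest, x, memo)
            total += product
        memo[nt] = total
    return memo[nt]
```

For `w = "aabb"` and start symbol `(0, "S", 4)`, the nonterminals `(0, "A", 2)`, `(0, "A", 3)` and `(0, "A", 4)` are never reached and are dropped by `reduce_forest`, and `count_trees` reports 5 parse trees for the reduced forest.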
Figure 14: Parse forest
with table T from Figure ^1



If $\mathcal{G}$ is in Chomsky normal form, then we have $|\mathcal{G}_w| = O(|\mathcal{G}|\,n^3)$. For general CFGs,
however, we have $|\mathcal{G}_w| = O(|\mathcal{G}|\,n^{p+1})$, where $p$ is the length of the longest right-hand side
of a rule in $\mathcal{G}$. In practical parsing applications this higher space complexity is usually
avoided by applying the following method, which is based on fHlini-. In place of computing
$\mathcal{G}_w$, one constructs an alternative CFG containing rules of the form $t \to t_1 \cdots t_m$, where
$t, t_1, \ldots, t_m \in T$ such that item $t$ was derived from items $t_1, \ldots, t_m$ via an inference rule
with $m$ antecedents. Parse trees according to this new CFG can be extracted as usual.
From these trees, the desired parse trees for $w$ according to $\mathcal{G}$ can be easily obtained by
elementary tree editing operations such as node relabelling and node erasing. The precise
editing algorithm that should be applied depends on the deduction system underlying the
adopted recognition algorithm.

If the adopted recognition algorithm has inference rules with no more than $m = 2$
antecedents, then the space complexity of the parsing method discussed above, expressed
as a function of the length $n$ of the input string, is $O(n^3)$. Note that $m = 2$ in the
case of Earley's algorithm, and this also holds in practical implementations of tabular LR
parsing, as discussed at the end of Section 6. The space complexity in the size of $\mathcal{G}$ may
be larger than $O(|\mathcal{G}|)$, however; it is $O(|\mathcal{G}|^2)$ in the case of Earley's algorithm and even
exponential in the case of tabular LR parsing.

The parse forest representation is originally due to with states of a finite automaton 
in place of positions in an input string. Parse forests have also been discussed by [71 123 
EniEl- Similar ideas were proposed for tree-adjoining grammars by IT7]. 
8 Further references



In this chapter we have restricted ourselves to tabulation for context-free parsing, on 
the basis of PDAs. A similar kind of tabulation was also developed for tree-adjoining 
grammars on the basis of an extended type of PDA. Tabulation for an even more
general type of PDA was discussed by jlUj.

A further restriction we have made is that the input to the parser must be a string. 
Context-free parsing can however be generalized to input consisting of a finite automaton. 
Finite automata without cycles used in speech recognition systems are also referred to as 
word graphs or word lattices [4]. The parsing methods developed in this chapter can be
easily adapted to parsing of finite automata, by manipulating states of an input automaton
in place of positions in an input string. This technique can be traced back to the original
parse forest construction, which we mentioned before in Section 7.

PDAs are usually considered to read input from left to right, and the forms of tabulation
that we discussed follow that directionality. For types of tabular parsing that are not
strictly in one direction, such as head-driven parsing Elj and island-driven parsing [2H],
it is less appealing to take PDAs as a starting point.

Earley's algorithm and the CKY algorithm run in cubic time in the length of the
input string. An asymptotically faster method for context-free parsing has been developed
by [SHI, using a reduction from context-free recognition to Boolean matrix multiplication.
An inverse reduction from Boolean matrix multiplication to context-free recognition has
been presented by jT31, providing evidence that asymptotically faster methods for
context-free recognition might not be of practical interest.

The extension of tabular parsing with weights or probabilities has been considered
by [221 for Earley's algorithm, by [SI] for the CKY algorithm, and by ^H] for tabular LR
parsing. Deduction systems for parsing extended with weights are discussed by [10].